Don't double-interrupt the test HTTP server #197

headius · 2024-12-10T05:49:40Z

The shutdown process here attempted to terminate the test server by interrupting it with Thread#kill, and then proceeded to close the server and join the thread. The kill does indeed interrupt the accept call, but the close call could also interrupt the thread as part of notifying blocked threads waiting on that socket call.

In JRuby, where all of this can happen at the same time, it leads to the following scenario:

The server thread enters TCPServer#accept and blocks.
The main thread calls Thread#kill to interrupt the accept call.
The server thread wakes up and starts to propagate the kill. There is a slight delay between this wakeup and removing the server thread from the TCPServer's blocked threads list.
The main thread calls TCPServer#close, which sees that the server thread is still in the blocked list, so it initiates a second interrupt to raise IOError "closed in another thread" on the server thread.
As the kill is bubbling out, another check for interrupts occurs, causing it to see the new raise interrupt and propagate that instead of the active kill.
Because the server is now closed and the rescue here is empty, the server loop will endlessly attempt and fail to call accept.

I was unable to determine how CRuby avoids this race. There may be code that prevents an active kill interrupt from triggering further interrupts.

In order to get these tests running on JRuby, I've made the following changes:

Only kill the thread; one interrupt is sufficient to break it out of the accept call.
Ensure outside the server loop that the server gets closed. This happens within the server thread, so triggers no new interrupts.
Minor cleanup for the pattern of using @ssl_server or @server.

This change avoids the race in JRuby (and possibly other parallel- threaded implementations) and does not impact the behavior of the tests.

@server

The shutdown process here attempted to terminate the test server by interrupting it with Thread#kill, and then proceeded to close the server and join the thread. The kill does indeed interrupt the accept call, but the close call could also interrupt the thread as part of notifying blocked threads waiting on that socket call. In JRuby, where all of this can happen at the same time, it leads to the following scenario: * The server thread enters TCPServer#accept and blocks. * The main thread calls Thread#kill to interrupt the accept call. * The server thread wakes up and starts to propagate the kill. There is a slight delay between this wakeup and removing the server thread from the TCPServer's blocked threads list. * The main thread calls TCPServer#close, which sees that the server thread is still in the blocked list, so it initiates a second interrupt to raise IOError "closed in another thread" on the server thread. * As the kill is bubbling out, another check for interrupts occurs, causing it to see the new raise interrupt and propagate that instead of the active kill. * Because the server is now closed and the rescue here is empty, the server loop will endlessly attempt and fail to call accept. I was unable to determine how CRuby avoids this race. There may be code that prevents an active kill interrupt from triggering further interrupts. In order to get these tests running on JRuby, I've made the following changes: * Only kill the thread; one interrupt is sufficient to break it out of the accept call. * Ensure outside the server loop that the server gets closed. This happens within the server thread, so triggers no new interrupts. * Minor cleanup for the pattern of using @ssl_server or @server. This change avoids the race in JRuby (and possibly other parallel- threaded implementations) and does not impact the behavior of the tests.

headius · 2024-12-10T05:50:46Z

@hsbt @nobu @ko1 I would appreciate some guidance on the interrupt questions here. Does CRuby have this race when multiple interrupts arrive in rapid succession? If not, how does it avoid an active "kill" interrupt getting overtaken by a second "raise" interrupt happening in parallel?

Copied from my PR to fix this double-interrupt race. See ruby/net-http#197

headius · 2024-12-10T06:14:48Z

I have described the issue in more detail, with a simulated case that also triggers on CRuby, in this JRuby bug: jruby/jruby#8501

I think my main question here is: should a thread that is dying due to a "kill" be able to receive "raise" interrupts? And vice-versa?

Whether JRuby has an actual bug or not (perhaps we should try to make sure the killed thread does not receive the IOError raise?) I want to know the answer to these questions.

This is the same situation as ruby/net-http#197, where a server thread is interrupted twice and ends up in a loop. This is more fallout from removing WEBrick and replacing its use in tests with small hand-written server code that apparently exposes a behavior difference in JRuby.

headius · 2024-12-27T21:51:47Z

@hsbt I ran into another example of this "double interrupt" test server code in open-uri's test utils: jruby/jruby@a355cd7

I'm not convinced yet that there's a bug in JRuby, but in any case, the new code is reliable on both CRuby and JRuby and allows the tests to complete in our CI. Hopefully we can have a chat about these changes soon.

ioquatix · 2025-01-18T23:35:21Z

Why not server = @ssl_server || @server and be done with all the conditionals later on?

headius · 2025-01-20T22:50:51Z

@ioquatix also good!

headius requested review from hsbt and nobu December 10, 2024 05:49

headius added a commit to headius/jruby that referenced this pull request Dec 10, 2024

Don't double-interrupt the test HTTP server

f07fc25

Copied from my PR to fix this double-interrupt race. See ruby/net-http#197

headius mentioned this pull request Dec 10, 2024

Thread kill can be interrupted by another interrupt jruby/jruby#8501

Open

hsbt merged commit e18aa61 into ruby:master Dec 31, 2024
22 checks passed

ioquatix mentioned this pull request Jan 18, 2025

UDPSocket#close hangs socketry/async#368

Open

headius deleted the no_double_interrupt_test_server branch January 20, 2025 22:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Don't double-interrupt the test HTTP server #197

Don't double-interrupt the test HTTP server #197

Uh oh!

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 27, 2024

Uh oh!

Uh oh!

ioquatix commented Jan 18, 2025

Uh oh!

headius commented Jan 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Don't double-interrupt the test HTTP server #197

Don't double-interrupt the test HTTP server #197

Uh oh!

Conversation

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 10, 2024

Uh oh!

headius commented Dec 27, 2024

Uh oh!

Uh oh!

ioquatix commented Jan 18, 2025

Uh oh!

headius commented Jan 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants